Improvement in Sanitation Facilities Across the World

Author

RaajMohan Rajabaksh

Code
import pandas as pd
Code
gapminder = pd.read_csv("/content/drive/MyDrive/dataset/unicef_metadata.csv")
Code
from plotnine import *

Introduction

Access to adequate sanitation is a fundamental human right and a cornerstone of public health, yet it remains a significant challenge worldwide. As of 2022, only 57% of the global population (about 4.6 billion people) used safely managed sanitation services, while over 1.5 billion lacked basic facilities such as private toilets or latrines. Alarmingly, 419 million people still practice open defecation, exposing communities to severe health risks and environmental contamination. Disparities are stark: nearly two-thirds of those without basic services live in rural areas, mostly in sub-Saharan Africa and parts of Asia. Poor sanitation is linked to diseases like diarrhoea, a leading cause of death among children under five, and contributes to malnutrition and lost economic productivity. Progress has been made, with 2.5 billion people gaining improved sanitation since 2000, but advancement is uneven, especially in low-income countries. Achieving universal access by 2030, per the Sustainable Development Goals, requires accelerated efforts, investment, and innovation worldwide.

Code
import matplotlib.pyplot as plt
import seaborn as sns

# Average life expectancy across all countries for each year
life_col = 'Life expectancy at birth, total (years)'
mean_per_year = gapminder.groupby('year')[life_col].mean().reset_index()

# Plot the yearly mean, then overlay a dotted LOWESS trend line
# (lowess=True requires the statsmodels package)
plt.plot(mean_per_year['year'], mean_per_year[life_col], color="black")
plt.title('Average Life Expectancy', fontweight="bold")
plt.xlabel('Year', color="red", fontweight="bold")
plt.ylabel('Life expectancy', color="red", fontweight="bold")
sns.regplot(x='year', y=life_col, data=mean_per_year, lowess=True, scatter=False, color='red', line_kws={'linestyle': 'dotted', 'color': 'red'})
plt.tick_params(axis='x', colors='red')  # x-axis ticks
plt.tick_params(axis='y', colors='red')  # y-axis ticks
plt.grid(False)
plt.yticks(rotation=90)
plt.show()



Life Expectancy

This graph displays the trend of average life expectancy across all countries over the years. By calculating the mean life expectancy for each year in the dataset, the visualization highlights how global health and longevity have changed over time.

An upward trend in the graph typically indicates improvements in healthcare, nutrition, sanitation, and living conditions worldwide. Conversely, periods of stagnation or decline may reflect historical events such as wars, pandemics, or economic crises that negatively impacted population health.

Overall, this visualization provides valuable insight into the progress and challenges in increasing life expectancy at a global scale.
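
One way to locate the periods of stagnation or decline described above is to compute the year-over-year change in the yearly mean. The sketch below assumes the `year` and `Life expectancy at birth, total (years)` columns used in the plot; the helper name `yearly_change` and the small demo frame are hypothetical, and the demo values are made up rather than taken from the dataset.

```python
import pandas as pd

LIFE_COL = "Life expectancy at birth, total (years)"


def yearly_change(data: pd.DataFrame) -> pd.DataFrame:
    """Mean life expectancy per year plus its change from the previous year."""
    out = (
        data.groupby("year", as_index=False)[LIFE_COL]
        .mean()
        .sort_values("year")
        .reset_index(drop=True)
    )
    # Difference between each year's mean and the previous year's mean
    out["change"] = out[LIFE_COL].diff()
    # Flag years in which the mean fell (the first year has no previous value)
    out["declined"] = out["change"] < 0
    return out


# Made-up values, for illustration only (not real observations)
demo = pd.DataFrame({
    "country": ["A", "B", "A", "B", "A", "B"],
    "year": [2000, 2000, 2001, 2001, 2002, 2002],
    LIFE_COL: [60.0, 70.0, 61.0, 71.0, 60.0, 70.0],
})
result = yearly_change(demo)
print(result)
```

Applied to the real data, the same function could be called as `yearly_change(gapminder)`, and the rows where `declined` is true would mark the years worth investigating.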

Code
import pandas as pd
import plotly.express as px

# Load the dataset
file_path = '/content/drive/MyDrive/dataset/indicator 1.csv'  # Replace with your file path
df = pd.read_csv(file_path)

# Rename 'obs_value' to 'sanitation' for clarity
df = df.rename(columns={'obs_value': 'sanitation'})

# Convert 'time_period' to numeric
df['time_period'] = pd.to_numeric(df['time_period'], errors='coerce')

# Filter for years >= 2000 AND drop NaN values in specified columns
df_cleaned = df[df['time_period'] >= 2000].dropna(subset=['country', 'sanitation', 'time_period', 'alpha_3_code'])

# Find the first year >= 2000 AFTER filtering
min_year = int(df_cleaned['time_period'].min())

# Define a red color scale
color_scale = px.colors.sequential.Reds

# Create the choropleth figure
fig = px.choropleth(
    df_cleaned,
    locations="alpha_3_code",          # ISO alpha-3 country codes
    color="sanitation",                # Sanitation percentage for color fill
    hover_name="country",              # Display country name on hover
    hover_data={
        'time_period': True,           # Show year
        'alpha_3_code': True,          # Show country code
        'sanitation': ':.2f'           # Show sanitation with 2 decimals
    },
    animation_frame="time_period",     # Animate by year
    color_continuous_scale=color_scale,
    projection="orthographic",         # Globe-like projection
    title="Sanitation Across the World Over Time"
)

# Customize the geographic layout
fig.update_geos(
    showocean=True,
    oceancolor="#0B9ED2",
    showland=True,
    landcolor="#C19A6B",
    showcountries=True,
    countrycolor="grey",
    showcoastlines=True,
    coastlinecolor="black",
    showframe=True,
    framecolor="black",
    projection_scale=1  # Zoom level (1 is default)
)

# Update layout for better appearance
fig.update_layout(
    title=dict(
        text="Sanitation Across the World Over Time",
        x=0.5,
        xanchor='center',
        font=dict(size=24, weight="bold")
    ),
    height=600,
    margin=dict(r=0, l=0, b=0, t=50),
    coloraxis_colorbar=dict(
        title="Sanitation (%)",
        ticksuffix="%",
    ),
    sliders=[{
        'active': 0,
        'pad': {'b': 10, 't': 50},
        'steps': [{
            'label': str(year),
            'method': 'animate',
            'args': [[str(year)],
                     {'frame': {'duration': 300, 'redraw': True},
                      'mode': 'immediate',
                      'transition': {'duration': 0}}
                    ]
        } for year in sorted(df_cleaned['time_period'].unique())],
        'currentvalue': {
            'prefix': 'Year: ',
            'visible': True,
            'xanchor': 'right'
        },

    }],
    hoverlabel=dict(  # Customize the hover labels
        bgcolor="white",
        font_size=12,
        font_family="Rockwell"
    )
)

fig.update_traces(
    hovertemplate="<b>%{hovertext}</b><br><br>" +  # Country name in bold
                  "Year: %{customdata[0]}<br>" +
                  "Country Code: %{customdata[1]}<br>" +
                  "Sanitation: %{customdata[2]:.2f}%<extra></extra>",
    customdata=df_cleaned[['time_period', 'alpha_3_code', 'sanitation']].values
)

# Save the interactive plot as a standalone HTML file (plotly.js loaded from a CDN)
fig.write_html("sanitation_world.html", full_html=True, include_plotlyjs='cdn')

# Optionally, still show it in your notebook or browser
fig.show()

Global Sanitation Health

While many high-income countries have achieved near-universal sanitation coverage, significant gaps persist across low- and middle-income nations. Global efforts aim to improve sanitation access, reduce health risks, and promote sustainable development as outlined in the United Nations’ Sustainable Development Goals (SDG 6). Visualizing sanitation data alongside economic indicators like GDP per capita highlights the strong link between national wealth and access to essential sanitation services.
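
To look at sanitation next to GDP per capita, the two files need to be joined first. The following is a minimal sketch, assuming the sanitation file has `country`, `time_period`, and `obs_value` columns and the metadata file has `country`, `year`, and `GDP per capita (constant 2015 US$)`, as in the code elsewhere in this report. It also assumes country names are spelled identically in both files, which has not been checked here. The function name `join_sanitation_gdp` and the demo values are hypothetical.

```python
import pandas as pd

GDP_COL = "GDP per capita (constant 2015 US$)"


def join_sanitation_gdp(sanitation: pd.DataFrame, metadata: pd.DataFrame) -> pd.DataFrame:
    """Pair each sanitation observation with the same country-year GDP value."""
    left = sanitation.rename(columns={"time_period": "year", "obs_value": "sanitation"})
    left = left[["country", "year", "sanitation"]]
    right = metadata[["country", "year", GDP_COL]]
    # Inner join: keep only country-years present in both tables
    merged = left.merge(right, on=["country", "year"], how="inner")
    return merged.dropna(subset=["sanitation", GDP_COL])


# Made-up values, for illustration only (not real observations)
sanitation_demo = pd.DataFrame({
    "country": ["A", "A", "B"],
    "time_period": [2000, 2001, 2000],
    "obs_value": [40.0, 45.0, 90.0],
})
metadata_demo = pd.DataFrame({
    "country": ["A", "B", "C"],
    "year": [2000, 2000, 2000],
    GDP_COL: [1000.0, 30000.0, 5000.0],
})
merged = join_sanitation_gdp(sanitation_demo, metadata_demo)
print(merged)
```

An inner join is used so that only country-years with both measurements remain; a left join would keep every sanitation row and leave GDP missing where no match exists.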

Code
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns

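# Load data. By default pandas parses the string "NA" as a missing value, which
# would erase Namibia's alpha-2 code, so only empty fields are treated as NaN here.
file_path = '/content/drive/MyDrive/dataset/indicator 1.csv'
df = pd.read_csv(file_path, keep_default_na=False, na_values=[''])

# Map ISO alpha-2 country codes to continents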
continent_map = {
    'AF': 'Asia', 'DZ': 'Africa', 'AD': 'Europe', 'AI': 'North America', 'AG': 'North America',
    'AM': 'Asia', 'AU': 'Oceania', 'AT': 'Europe', 'AZ': 'Asia', 'BH': 'Asia', 'BD': 'Asia',
    'BB': 'North America', 'BY': 'Europe', 'BE': 'Europe', 'BZ': 'North America', 'BT': 'Asia',
    'BO': 'South America', 'BR': 'South America', 'VG': 'North America', 'BN': 'Asia', 'KH': 'Asia',
    'KY': 'North America', 'CF': 'Africa', 'CL': 'South America', 'CN': 'Asia', 'HK': 'Asia',
    'MO': 'Asia', 'CO': 'South America', 'CK': 'Oceania', 'CR': 'North America', 'HR': 'Europe',
    'CU': 'North America', 'CZ': 'Europe', 'DK': 'Europe', 'DJ': 'Africa', 'DM': 'North America',
    'EC': 'South America', 'EG': 'Africa', 'SV': 'North America', 'EE': 'Europe', 'ET': 'Africa',
    'FJ': 'Oceania', 'FI': 'Europe', 'FR': 'Europe', 'GA': 'Africa', 'GE': 'Asia', 'DE': 'Europe',
    'GH': 'Africa', 'GI': 'Europe', 'GD': 'North America', 'GW': 'Africa', 'HT': 'North America',
    'HN': 'North America', 'HU': 'Europe', 'IN': 'Asia', 'ID': 'Asia', 'IQ': 'Asia', 'IL': 'Asia',
    'IT': 'Europe', 'JO': 'Asia', 'KW': 'Asia', 'LA': 'Asia', 'LV': 'Europe', 'LB': 'Asia',
    'LR': 'Africa', 'LY': 'Africa', 'LT': 'Europe', 'LU': 'Europe', 'MW': 'Africa', 'MY': 'Asia',
    'ML': 'Africa', 'MT': 'Europe', 'MU': 'Africa', 'MX': 'North America', 'FM': 'Oceania',
    'MC': 'Europe', 'MN': 'Asia', 'ME': 'Europe', 'MS': 'North America', 'MZ': 'Africa', 'NA': 'Africa',
    'NR': 'Oceania', 'NP': 'Asia', 'NL': 'Europe', 'NI': 'North America', 'NG': 'Africa', 'NU': 'Oceania',
    'NO': 'Europe', 'OM': 'Asia', 'PW': 'Oceania', 'PA': 'North America', 'PG': 'Oceania', 'PY': 'South America',
    'PE': 'South America', 'PH': 'Asia', 'PL': 'Europe', 'PT': 'Europe', 'QA': 'Asia', 'KR': 'Asia',
    'MD': 'Europe', 'RU': 'Europe', 'RW': 'Africa', 'KN': 'North America', 'LC': 'North America',
    'VC': 'North America', 'WS': 'Oceania', 'SM': 'Europe', 'SA': 'Asia', 'RS': 'Europe', 'SC': 'Africa',
    'SL': 'Africa', 'SG': 'Asia', 'SK': 'Europe', 'SI': 'Europe', 'SB': 'Oceania', 'SO': 'Africa',
    'ES': 'Europe', 'PS': 'Asia', 'SD': 'Africa', 'CH': 'Europe', 'TJ': 'Asia', 'TN': 'Africa',
    'TM': 'Asia', 'TC': 'North America', 'TR': 'Asia', 'UG': 'Africa', 'UA': 'Europe', 'AE': 'Asia',
    'TZ': 'Africa', 'US': 'North America', 'UY': 'South America', 'VU': 'Oceania', 'YE': 'Asia',
    'ZW': 'Africa'
}

# Map continents, convert to numeric, then drop incomplete rows
df['continent'] = df['alpha_2_code'].map(continent_map)
df['time_period'] = pd.to_numeric(df['time_period'], errors='coerce')
df['obs_value'] = pd.to_numeric(df['obs_value'], errors='coerce')
df = df.dropna(subset=['continent', 'obs_value', 'time_period']).copy()
df['time_period'] = df['time_period'].astype(int)

# Group by continent and year, take the mean
df_grouped = df.groupby(['continent', 'time_period'], as_index=False)['obs_value'].mean()

# Plot
sns.set(style="whitegrid")
plt.figure(figsize=(14, 8))
sns.lineplot(
    data=df_grouped,
    x='time_period',
    y='obs_value',
    hue='continent',
    marker='o'
)

plt.title('Proportion of Schools with Improved Sanitation by Continent', fontweight="bold", fontsize=18)
plt.xlabel('Year', color="red", fontweight="bold", fontsize=14)
plt.ylabel('Proportion of Schools with Improved Sanitation (%)', color="red", fontweight="bold", fontsize=14)
plt.legend(title='Continent', fontsize=12, title_fontsize=13)
plt.tick_params(axis='x', colors='red', labelsize=12)
plt.tick_params(axis='y', colors='red', labelsize=12)
plt.grid(False)
plt.tight_layout()
plt.show()

Sanitation Levels over Continents

Over the past two decades, significant progress has been made in improving access to sanitation facilities in schools across all continents. Driven by international initiatives such as the United Nations’ Sustainable Development Goals (SDG 6), governments and organizations have prioritized investments in clean water and sanitation infrastructure within educational settings. This progress has been crucial in promoting better health, reducing the spread of disease, and increasing school attendance, especially among girls who are disproportionately affected by the lack of proper sanitation.

While notable improvements have been observed in regions such as East Asia, Europe, and North America, many schools in Sub-Saharan Africa and parts of South Asia continue to face challenges in providing adequate sanitation facilities. The data highlights a global trend toward progress, but also underscores persistent inequalities between and within regions. Ongoing efforts are essential to ensure that all children, regardless of where they live, have access to safe, private, and hygienic sanitation at school — a fundamental right that supports both health and educational outcomes.
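
The continent averages plotted above hide differences between countries on the same continent. As a rough way to quantify the within-region inequality mentioned here, the sketch below reports the minimum, maximum, mean, and range of `obs_value` per continent for a single year. It assumes a frame with the `continent`, `time_period`, and `obs_value` columns built in the preceding code; the function name `continent_spread` and the demo values are hypothetical.

```python
import pandas as pd


def continent_spread(data: pd.DataFrame, year: int) -> pd.DataFrame:
    """Min, max, mean and range of obs_value per continent for one year."""
    snapshot = data[data["time_period"] == year]
    stats = snapshot.groupby("continent")["obs_value"].agg(["min", "max", "mean"])
    # A wide range means countries on that continent differ a lot from each other
    stats["range"] = stats["max"] - stats["min"]
    return stats.sort_values("range", ascending=False)


# Made-up values, for illustration only (not real observations)
demo = pd.DataFrame({
    "continent": ["X", "X", "Y", "Y"],
    "time_period": [2020, 2020, 2020, 2020],
    "obs_value": [20.0, 80.0, 90.0, 100.0],
})
spread = continent_spread(demo, 2020)
print(spread)
```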

Code
import pandas as pd
import matplotlib.pyplot as plt

# Load the dataset
file_path = '/content/drive/MyDrive/dataset/indicator 1.csv'  # Replace with your actual file path
df = pd.read_csv(file_path)

# Clean the data: Ensure 'country', 'time_period', and 'obs_value' are present and valid
df_cleaned = df.dropna(subset=['country', 'time_period', 'obs_value']).copy()

# Ensure 'time_period' is numeric
df_cleaned['time_period'] = pd.to_numeric(df_cleaned['time_period'], errors='coerce')
df_cleaned = df_cleaned.dropna(subset=['time_period'])

# Ensure 'obs_value' is numeric
df_cleaned['obs_value'] = pd.to_numeric(df_cleaned['obs_value'], errors='coerce')
df_cleaned = df_cleaned.dropna(subset=['obs_value'])

# Filter for the specified countries
countries_of_interest = ['Bangladesh', 'Bhutan', 'Malawi', 'Haiti', 'India', 'Sierra Leone']
df_filtered = df_cleaned[df_cleaned['country'].isin(countries_of_interest)]

# Create the plot
plt.figure(figsize=(12, 7))

# Plot a line for each country, ordered by year so the line does not double back
for country in countries_of_interest:
    country_data = df_filtered[df_filtered['country'] == country].sort_values('time_period')
    plt.plot(
        country_data['time_period'],
        country_data['obs_value'],
        label=country
    )

# Customize the plot
plt.title('Top Countries with Largest Improvement in Sanitation', fontsize=20, fontweight='bold')

plt.xlabel('Year', color='red', fontsize=14, fontweight='bold')
plt.ylabel('Sanitation (%)', color='red', fontsize=14, fontweight='bold')

plt.xticks(color='red', fontsize=12)
plt.yticks(color='red', fontsize=12)

plt.legend(title='Country')
plt.grid(False)
plt.tight_layout()
plt.show()

Largest Improvements

Over the past several years, notable progress has been made in improving sanitation facilities in schools across Bangladesh, Bhutan, Haiti, India, Malawi, and Sierra Leone. These countries have demonstrated a strong commitment to enhancing hygiene standards in educational settings, often through targeted national programs and international partnerships aimed at expanding access to clean water and safe sanitation.

The improvements in school sanitation have contributed to better health outcomes, reduced absenteeism, and more inclusive learning environments, particularly benefiting girls, who are disproportionately affected by inadequate facilities. While challenges remain, the advancements seen in these countries serve as an encouraging example of how focused investment and policy action can drive substantial progress toward achieving universal access to basic sanitation in schools.
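
The six countries above are named directly in the plotting code, and the selection criteria are not shown. One possible way to derive such a list from the data is to rank countries by the difference between their latest and earliest observed values. This is a sketch under that assumption, not necessarily the method used for this report; `rank_by_improvement` and the demo values are hypothetical.

```python
import pandas as pd


def rank_by_improvement(data: pd.DataFrame, top_n: int = 6) -> pd.DataFrame:
    """Rank countries by latest minus earliest obs_value."""
    # Sort by year within each country so first/last mean earliest/latest
    ordered = data.sort_values(["country", "time_period"])
    grouped = ordered.groupby("country")["obs_value"]
    summary = pd.DataFrame({
        "first": grouped.first(),
        "last": grouped.last(),
    })
    summary["improvement"] = summary["last"] - summary["first"]
    return summary.sort_values("improvement", ascending=False).head(top_n)


# Made-up values, for illustration only (rows deliberately out of order)
demo = pd.DataFrame({
    "country": ["A", "B", "C", "A", "B", "C"],
    "time_period": [2010, 2000, 2010, 2000, 2010, 2005],
    "obs_value": [50.0, 80.0, 20.0, 10.0, 85.0, 30.0],
})
ranking = rank_by_improvement(demo, top_n=2)
print(ranking)
```

Countries with only one observation get an improvement of zero under this definition, so filtering them out beforehand may be sensible.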

Code


import pandas as pd
import matplotlib.pyplot as plt
import numpy as np
import ipywidgets as widgets
from IPython.display import display, HTML

# Load the dataset
file_path = '/content/drive/MyDrive/dataset/unicef_metadata.csv'
df = pd.read_csv(file_path)

# Rename columns for easier handling
df = df.rename(columns={
    'GDP per capita (constant 2015 US$)': 'GDP',
    'Life expectancy at birth, total (years)': 'life-expectancy'
})

# Group by country to calculate average GDP and life expectancy
avg_data = df.groupby('country').agg({
    'GDP': 'mean',
    'life-expectancy': 'mean'
}).reset_index()

# Assign a color to each country (tab20 has only 20 distinct colors, so colors repeat when there are more countries)
countries = avg_data['country'].unique()
colors = plt.cm.tab20(np.linspace(0, 1, len(countries)))
country_color_dict = dict(zip(countries, colors))

# Create the scatter plot and save to a buffer
fig, ax = plt.subplots(figsize=(12, 12))
for country in countries:
    subset = avg_data[avg_data['country'] == country]
    ax.scatter(
        subset['GDP'],
        subset['life-expectancy'],
        label=country,
        color=country_color_dict[country],
        s=80,
        edgecolor='black'
    )

ax.set_title('Average GDP per Capita vs. Average Life Expectancy', fontsize=24, fontweight='bold', color='black')
ax.set_xlabel('GDP per Capita (constant 2015 US$)', fontsize=16, color='red', fontweight='bold')
ax.set_ylabel('Life Expectancy at Birth (years)', fontsize=16, color='red', fontweight='bold')
ax.tick_params(axis='x', colors='red', labelsize=12)
ax.tick_params(axis='y', colors='red', labelsize=12)
ax.grid(False)  # Remove grid lines
plt.tight_layout()

# Save the plot to an image buffer
import io
buf = io.BytesIO()
plt.savefig(buf, format='png', bbox_inches='tight')
plt.close(fig)
buf.seek(0)

# Create a scrollable HTML legend with title and extra info
legend_html = """
<div style='font-weight:bold; font-size:18px; margin-bottom:10px; text-align:center;'>Countries</div>
<div style='height:500px; overflow:auto; border:1px solid #ccc; padding:10px; width:350px;'>
"""
for country in countries:
    rgb = tuple((country_color_dict[country][:3]*255).astype(int))
    hex_color = '#%02x%02x%02x' % rgb
    row = avg_data[avg_data['country'] == country].iloc[0]
    legend_html += (
        f"<div style='margin-bottom:10px;'>"
        f"<span style='display:inline-block; width:18px; height:18px; background:{hex_color}; margin-right:8px; border:1px solid #888;'></span>"
        f"<b>{country}</b><br>"
        f"GDP: {row['GDP']:.2f}<br>"
        f"Life Exp.: {row['life-expectancy']:.2f}"
        f"</div>"
    )
legend_html += "</div>"

# Display plot and legend side by side
img_widget = widgets.Image(value=buf.getvalue(), format='png', width=700, height=500)
legend_widget = widgets.HTML(value=legend_html)

hbox = widgets.HBox([img_widget, legend_widget])
display(hbox)
Code
import pandas as pd
import plotly.express as px

# Load the dataset
file_path = '/content/drive/MyDrive/dataset/unicef_metadata.csv'
df = pd.read_csv(file_path)

# Rename columns for easier handling
df = df.rename(columns={
    'GDP per capita (constant 2015 US$)': 'GDP',
    'Life expectancy at birth, total (years)': 'life-expectancy'
})

# Group by country to calculate average GDP and life expectancy
avg_data = df.groupby('country').agg({
    'GDP': 'mean',
    'life-expectancy': 'mean'
}).reset_index()

# Create the interactive scatter plot
fig = px.scatter(
    avg_data,
    x='GDP',
    y='life-expectancy',
    color='country',
    hover_name='country',
    hover_data={
        'GDP': ':.2f',
        'life-expectancy': ':.2f'
    },
    title='Average GDP per Capita vs. Average Life Expectancy'
)

fig.update_layout(
    legend_title_text='Countries',
    legend=dict(
        title_font=dict(size=18),
        font=dict(size=12),
        orientation="v",
        yanchor="top",
        y=1,
        xanchor="left",
        x=1.02,
        bgcolor="white",
        bordercolor="black",
        borderwidth=1,
        itemsizing='trace'
    ),
    xaxis_title=dict(text='GDP per Capita (constant 2015 US$)', font=dict(color='red', size=16, family='Arial')),
    yaxis_title=dict(text='Life Expectancy at Birth (years)', font=dict(color='red', size=16, family='Arial')),
    title_font=dict(size=24, family='Arial', color='black'),
    margin=dict(l=40, r=200, t=60, b=40)
)

fig.show()

How GDP Relates to Life Expectancy

There is a strong and well-documented positive relationship between a country’s GDP per capita and the average life expectancy of its population. Generally, nations with higher GDP per capita tend to offer better healthcare services, improved sanitation, higher educational attainment, and greater access to nutritious food, all of which contribute significantly to longer life spans.

Countries such as Japan, Switzerland, and Australia, which enjoy high levels of economic prosperity, report some of the highest life expectancies in the world. In contrast, countries with lower GDP per capita, particularly in parts of Sub-Saharan Africa and South Asia, often face barriers such as inadequate healthcare infrastructure, higher rates of infectious diseases, and limited public health resources, resulting in shorter average lifespans.

However, the relationship is not purely linear. Several nations demonstrate that effective public health interventions, education, and social policies can lead to relatively high life expectancy even without extremely high GDP levels. Examples include countries like Costa Rica and Vietnam, where proactive healthcare initiatives and community-based programs have led to remarkable health outcomes despite modest economic means.

This analysis underscores that while economic growth is a critical driver of health improvements, policy decisions, equitable access to services, and social investments also play pivotal roles in extending and improving quality of life.
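
The non-linear shape of the relationship can be checked numerically by comparing the correlation of life expectancy with raw GDP against its correlation with the logarithm of GDP. The sketch below assumes a frame shaped like `avg_data` from the code above, with `GDP` and `life-expectancy` columns. The function name `gdp_life_correlations` is hypothetical, and the demo values are made up so that life expectancy rises by a fixed amount for each tenfold increase in GDP.

```python
import numpy as np
import pandas as pd


def gdp_life_correlations(avg_data: pd.DataFrame) -> dict:
    """Pearson correlation of life expectancy with GDP and with log10(GDP)."""
    clean = avg_data.dropna(subset=["GDP", "life-expectancy"])
    clean = clean[clean["GDP"] > 0]  # the logarithm needs positive values
    life = clean["life-expectancy"]
    return {
        "pearson_linear": clean["GDP"].corr(life),
        "pearson_log": np.log10(clean["GDP"]).corr(life),
    }


# Made-up values, for illustration only: +10 years per tenfold GDP increase
demo = pd.DataFrame({
    "country": ["A", "B", "C", "D"],
    "GDP": [100.0, 1000.0, 10000.0, 100000.0],
    "life-expectancy": [60.0, 70.0, 80.0, 90.0],
})
correlations = gdp_life_correlations(demo)
print(correlations)
```

If the log-scale correlation is clearly higher on the real data, that would suggest diminishing returns: each additional dollar of GDP per capita is associated with a smaller gain in life expectancy at higher income levels.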

Conclusion

This project has highlighted critical global patterns in sanitation access, economic development, and life expectancy. By visualizing and analyzing data across countries and continents, we observed that improvements in sanitation, particularly within schools, have been substantial but uneven, with nations such as Bangladesh, Bhutan, Haiti, India, Malawi, and Sierra Leone showing notable progress. At the same time, broader comparisons revealed a strong positive relationship between GDP per capita and life expectancy, underscoring the profound impact of economic prosperity on public health outcomes.

However, the analysis also demonstrated that effective public health policies, education, and targeted investments can lead to significant improvements even in lower-income countries. These findings reinforce the importance of combining economic growth strategies with social development programs to promote equitable and sustainable progress worldwide.

Ultimately, the project emphasizes that while global efforts are yielding positive results, continued focus on equity, access, and local context-specific solutions remains essential to closing the gaps and ensuring a healthier, more prosperous future for all.